Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: add router benchmark program and use it as integration test #4437

Merged
merged 43 commits into from
Nov 17, 2023

Conversation

jiceatscion
Copy link
Contributor

@jiceatscion jiceatscion commented Nov 6, 2023

Summary of changes:

  • Enhance the end2end_integration programs as follows:
    • end2end_integration has a slightly finer-grain role-based pair selection capability.
    • end2end_integration outputs an extra result line that provides precise start/stop timestamps for the whole test.
  • process_metrics exposes the number of cores used by Go.
  • Add end2endblast. Similar to end2end but plays a game other than pingpong: packetblast. It keeps logging and tracing to a minimum. The server side responds to very few of the pings (1/256). Just enough to prove that the circuit works.
  • Add an integration test called router_benchmark:
    • relies on an especially crafted topology: router_bm.topo
    • uses end2end_integration with command end2endblast over that topology as a load test for the router
    • extracts and logs relevant performance metrics after the test
    • compares performances with a set of expectations if executed by CI.

As a benchmark program, this is only a first approximation. Extending the end2end suite provided a quick solution but isn't ideal: it uses too many cores at once due to requiring 5 routers actively running for one of the tests. More work is in progress to replace this with some leaner setup where only one router is actually running.

Fixes: #4410
Fixes: #4411


This change is Reviewable

Miscellaneous facts:
* This load test can be used to benchmark the router.
* The test orchestrator is a simplified variant of the end2end_integration test.
* The leg-work is done by the end2end binary, which has been extended to play other games than ping-pong.
* The test always exercises all available pairs. For the purpose of benchmarking, it relies on a bespoke topology: router_bm.topo, which has just enough to exercise all types of forwarding.
For now the test is not accomplishing much. It runs the benchmark and
collects "some" metrics. The metrics are just printed out in raw form.
A future version will collect upcoming (separate PR) metrics that are
actually relevant to a router benchmark, and compare them with some
expectations.
I needed most of the features in end2end_integration
so there was no benefit in having a separate executable.

It turned out impossible to create a custom topology that
would produce only one type of traffic at one router while
exercising all possible role pairs. One still needs to chose
role pairs to obtain that. Also I couldn't do it with only
the existing subsets. I needed to add noncore#nonlocalcore
and a slightly extended topology.
Also suppress most pongs in the packetflood game, in order to allow
some routers to have only IN or only OUT traffic.
Mainly:
* Added some more flexibility in the end2end_integration subset selection.
* Make use of samein router_benchmark test.
* Suppressed some more tracing under the traces=false option.
* Cache the path to the target during during the packetflood game.
Changes:
* added flowID to ping messages in the router benchmark program.
* fix ping response suppression. Do not suppress one in 256 responses,
  but allow one in 256.
* fixed typoe in log.
* fix metric labeling. The proper interface ID for each metric had been
  lost in a refactoring.
Changes:
* Make server methods take a pointer receiver so pongs count isn't reset for
  each call.
* Formatting and so on.
Mainly:
* Add mertic to expose number of cores used by Go.
* Clarify running/runnable metrics.
* Collect the correct performance data from the test run.
* Fixed the router_bm topo so that there actually are two ISDs.
Have the test driver supply the best metrics time-window to the test harness.
Rely on wait_connectivity rather than the passage of time.
This is in the faint hope of evading a merge conflict.
...Or may be I had already rebased on top of the trunk's fix. Don't know.
@jiceatscion jiceatscion requested a review from a team as a code owner November 6, 2023 18:36
Had to skip the autogenerated docs in rules_openapi/tools as it seems we have no control on those and mdlint doesn't like them anymore.
Checks are only performed during CI. Expectations have been extrapolated from
one initial CI run.
That was the problem.
Just so that I can easily see what the CI environment is like.
Delete edundant log.
Move core count cpature and output to the end.
Five seconds gave too much variability. It seems that pkt rate
and cpu rate don't align very well or that cpu rate is very volatile.
That caused pkts/cpu-sec to have large random spikes. With 10s the spikes
are gone.
Since switching to a 10s window the average observed performance may have
been reduced a bit.
Summary:

* In the router, prevent the Go runtime from trying to use more cores than
  the specified number of processor threads. Reason:
  * Be consistent with the default where there are as many processor threads as
    there are cores.
  * In non-default cases, we want to be able to support multiple routers on the
    same host. When doing that, Go's assumption that all the cores are fair game
	distorts the performance as all router instances compete for the same
	cores and so incurr far more preemption than they would in a normal deployment.
* In the test harness, configure the number of cores allowed to each router
  based on what's available on the host running the test.
* In the end2end binary, terminate the pong drain task quickly instead of
  waiting up to 10s. This allows to report the end-time of the test accurately,
  thereby removing a lot of variability in the results (before, the metrics
  capture window could overlap with the end of the active phase of the test).
* Because the test goes faster, increase the number of packets to 1.5M, so we
  are sure we have a reliable 10s metrics capture window in the middle.
Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 14 files at r1.
Reviewable status: 2 of 15 files reviewed, 6 unresolved discussions (waiting on @jiceatscion)


.buildkite/pipeline_lib.sh line 20 at r2 (raw file):

        if [[ "$test" =~ router_benchmark ]]; then
            args="--test_arg=--ci"
	fi

There should already be an env variable CI which could be used instead of this flag. If I understand the plumbum API correctly, we could even set the flag implicitly based on an envvar using cli.Flag(...., envname="CI").


acceptance/router_benchmark/BUILD.bazel line 8 at r2 (raw file):

    args = [
        "--executable=end2end_integration:$(location //tools/end2end_integration)",
    ],

I'm actually a bit confused how this can even run the ./bin/end2end as a subcommand binary. How is this even built and how come this is on the appropriate path to run here? Usually bazel is such a stickler about this...


acceptance/router_benchmark/test.py line 3 at r2 (raw file):

#!/usr/bin/env python3

# Copyright 2020 Anapaya Systems

Nit copy pasted copyright


pkg/snet/packet_conn.go line 118 at r2 (raw file):

	// FlowID: arbitrary value to distribute connections over router CPUs.
	// Initialize to 1 if backward compatible (no-distribution) behavior is desired.
	FlowID uint32

Could we deal with this Flow ID change in a separate PR, so that this gets the appropriate visibility?

IMO, PacketConn this is not the right API level to expose this. The PacketConn is sort of a "raw" connection. A user of this API can be expected to set the FlowID directly in the Packet.

I would add this to the snet.Conn instead, setting it when creating a packet in scionConnWriter.WriteTo. We can e.g. set this based on a hash of src/dst ISD, AS, Host address and Port (see e.g. https://www.rfc-editor.org/rfc/rfc6437#appendix-A).
I don't think this requires a backwards compatibility layer. From an application perspective, with using a constant FlowID, we would have possibly had an ordering guarantee between packets sent from different snet.Conns. I doubt that this would even have worked in the first place (dispatcher / sending host might not guarantee the send order already). (Ab-)Using this in an application would probably have been tricky enough that I'm sure someone would have noticed that it's a bad idea.


tools/end2end/main.go line 17 at r2 (raw file):

// This is a general purpose client/server code for end2end tests. It plays
// ping-pong with some variantions depending on command line arguments.

I'm not sure about generalizing this end2end thing. It feels to me like this makes things much more complicated than they need to be.

Wouldn't a separate "blaster" be much shorter and simpler than this? We only want run one test at a time, and so we don't seem to benefit much from the end2end_integration harness. The server side becomes trivial if we don't send any replies and just report received packets out of band.

Note that we could also use the end2end_integration harness with a different command than end2end -- it just assumes that some flags are supported -- so this could still be used with a simpler "blaster" test.

I don't know, what do you think?


tools/integration/integrationlib/common.go line 150 at r2 (raw file):

		}
		log.Info("Retrying...")
		time.Sleep(integration.RetryTimeout)

Nit. This is now printing "Retrying..." after the last attempt.

Copy link
Contributor Author

@jiceatscion jiceatscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 15 files reviewed, 2 unresolved discussions (waiting on @matzf)


.buildkite/pipeline_lib.sh line 20 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

There should already be an env variable CI which could be used instead of this flag. If I understand the plumbum API correctly, we could even set the flag implicitly based on an envvar using cli.Flag(...., envname="CI").

Done.
I wasn't aware. Works like a charm.


acceptance/router_benchmark/BUILD.bazel line 8 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

I'm actually a bit confused how this can even run the ./bin/end2end as a subcommand binary. How is this even built and how come this is on the appropriate path to run here? Usually bazel is such a stickler about this...

I honestly don't know. I didn't invent this; I cargo-culted it from trc_update/BUILD.bazel. Apparently the value of "--executable" is actually a mapping: "end2end_integration" -> path_of(tools/end2end_integration). The rest is the business of common/base.py (which builds the map) and /test.py that peruses it. Bazel is content because the executable file is listed as a data dependency of the test.


acceptance/router_benchmark/test.py line 3 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit copy pasted copyright

Done


pkg/snet/packet_conn.go line 118 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

Could we deal with this Flow ID change in a separate PR, so that this gets the appropriate visibility?

IMO, PacketConn this is not the right API level to expose this. The PacketConn is sort of a "raw" connection. A user of this API can be expected to set the FlowID directly in the Packet.

I would add this to the snet.Conn instead, setting it when creating a packet in scionConnWriter.WriteTo. We can e.g. set this based on a hash of src/dst ISD, AS, Host address and Port (see e.g. https://www.rfc-editor.org/rfc/rfc6437#appendix-A).
I don't think this requires a backwards compatibility layer. From an application perspective, with using a constant FlowID, we would have possibly had an ordering guarantee between packets sent from different snet.Conns. I doubt that this would even have worked in the first place (dispatcher / sending host might not guarantee the send order already). (Ab-)Using this in an application would probably have been tricky enough that I'm sure someone would have noticed that it's a bad idea.

I suspected you'd object to such an ad-hoc API change. I just did a couple of runs with and without... the performance difference is actually unknown because I just discovered while deleting the code that I had forgotten to use the passed flowID value when serializing! So, I guess we can live without that until we had time to add the feature properly.


tools/end2end/main.go line 17 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

I'm not sure about generalizing this end2end thing. It feels to me like this makes things much more complicated than they need to be.

Wouldn't a separate "blaster" be much shorter and simpler than this? We only want run one test at a time, and so we don't seem to benefit much from the end2end_integration harness. The server side becomes trivial if we don't send any replies and just report received packets out of band.

Note that we could also use the end2end_integration harness with a different command than end2end -- it just assumes that some flags are supported -- so this could still be used with a simpler "blaster" test.

I don't know, what do you think?

A separate blaster may be a "simpler" solution to execute, but not to code, because end2end is already written. I was about to write a separate tool when I realized that the changes I needed to make to end2end to satisfy my needs where very simple. In the long run I agree with you. We'll eventually make a specialized tool. Not because it's easier but because it can test fewer components at a time (i.e. one), which will give more accurate readings. For now, I'm happy that I could meter all 5 traffic types in two runs with one topology. I find it pretty convenient to get some early results. So, if it's ok with you, I'd prefer to keep this for now and do another round of improvements in a later PR.


tools/integration/integrationlib/common.go line 150 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit. This is now printing "Retrying..." after the last attempt.

Done

Summary:
* Stop abusing the num_processors config variable and add a num_cores config
  for that purpose. (This gives us the flexibility of configuring processors
  and cores independently.
* Adjust the core allocation scheme in the test harness. Gives better results
  and better balance in presence of other than 12 cores.
* Document the new config variable.
Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 14 files at r1, 5 of 9 files at r5, all commit messages.
Reviewable status: 7 of 16 files reviewed, 4 unresolved discussions (waiting on @jiceatscion)


acceptance/router_benchmark/BUILD.bazel line 8 at r2 (raw file):

Previously, jiceatscion wrote…

I honestly don't know. I didn't invent this; I cargo-culted it from trc_update/BUILD.bazel. Apparently the value of "--executable" is actually a mapping: "end2end_integration" -> path_of(tools/end2end_integration). The rest is the business of common/base.py (which builds the map) and /test.py that peruses it. Bazel is content because the executable file is listed as a data dependency of the test.

Yup, I saw that it's the same in trc_update. I do understand the end2end_integration, but I'm confused about end2end which is invoked by end2end_integration. I think this is escapes the bazel sandbox somehow, which probably means that it only works because we build into bin/ before.


doc/manuals/router.rst line 182 at r5 (raw file):

      The send buffer size in bytes. 0 means use system default.

   .. option:: router.num_cores = <int> (Default: ``GOMAXPROCS``)

I would prefer not to create a redundant option to control this. The existing GOMAXPROCS already does what we need. I think it's a great idea to document better that this (and other GO...) environment variables apply.

In the test setup, adding an environment variable is just as easy as editing the toml, it can be set in the docker-compose file.
(Btw, the compose file also has some options to limit cpu usage. AFAIU, setting the cpuset option would result in an affinity mask which the go runtime does take into account for the default GOMAXPROC value. The go runtime does not, however, take into account CPU quotas with cpus value.)


pkg/snet/dispatcher.go line 59 at r5 (raw file):

	ia addr.IA,
	registration *net.UDPAddr,
	svc addr.SVC) (PacketConn, uint16, error) {

Nit: maybe revert this file and snet/packet_conn.go entirely to avoid any git blame and or merge issues.


pkg/snet/packet_conn.go line 118 at r2 (raw file):

Previously, jiceatscion wrote…

I suspected you'd object to such an ad-hoc API change. I just did a couple of runs with and without... the performance difference is actually unknown because I just discovered while deleting the code that I had forgotten to use the passed flowID value when serializing! So, I guess we can live without that until we had time to add the feature properly.

Ok!


tools/end2end/main.go line 17 at r2 (raw file):

A separate blaster may be a "simpler" solution to execute, but not to code, because end2end is already written.

The end2end tool grew by about 40% lines of code to include the packetflood. My gut feeling is that with a copy-pasted and stream-lined version of end2end that only does the "packetflood", we would end up with not much more code in total.
I don't like that this now conflates two mostly separate code paths just to share some of the boilerplate, but I think I don't feel strongly enough about this end2end test to object. Especially, as you already have plans to replace this. @scionproto/scion-core-team opinions?


topology/router_bm.topo line 1 at r5 (raw file):

--- # Bespoke Topology for router benchmarking:

Nit. For other tests that expect specific topology, we put this .topo file into the testdata, i.e. acceptance/router_benchmark/testdata/. Probably a good way to keep things tidy.

Copy link
Contributor Author

@jiceatscion jiceatscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 16 files reviewed, 2 unresolved discussions (waiting on @matzf)


acceptance/router_benchmark/BUILD.bazel line 8 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

Yup, I saw that it's the same in trc_update. I do understand the end2end_integration, but I'm confused about end2end which is invoked by end2end_integration. I think this is escapes the bazel sandbox somehow, which probably means that it only works because we build into bin/ before.

mmm, end2end_integration might need to have bin/end2end as a data dependency too. It seems to be enough. Although I don't know precisely how bazel manages it.


doc/manuals/router.rst line 182 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

I would prefer not to create a redundant option to control this. The existing GOMAXPROCS already does what we need. I think it's a great idea to document better that this (and other GO...) environment variables apply.

In the test setup, adding an environment variable is just as easy as editing the toml, it can be set in the docker-compose file.
(Btw, the compose file also has some options to limit cpu usage. AFAIU, setting the cpuset option would result in an affinity mask which the go runtime does take into account for the default GOMAXPROC value. The go runtime does not, however, take into account CPU quotas with cpus value.)

Well, yes it occured to me that we could use the env var directly. I just loathe env vars. They pop in and out of existence with no dicernable origin and no scope. I thought I was doing us a favor by adding an explicit config... but ok. I used the env var instead.


pkg/snet/dispatcher.go line 59 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit: maybe revert this file and snet/packet_conn.go entirely to avoid any git blame and or merge issues.

done


tools/end2end/main.go line 17 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

A separate blaster may be a "simpler" solution to execute, but not to code, because end2end is already written.

The end2end tool grew by about 40% lines of code to include the packetflood. My gut feeling is that with a copy-pasted and stream-lined version of end2end that only does the "packetflood", we would end up with not much more code in total.
I don't like that this now conflates two mostly separate code paths just to share some of the boilerplate, but I think I don't feel strongly enough about this end2end test to object. Especially, as you already have plans to replace this. @scionproto/scion-core-team opinions?

Yeah, in terms of code quantity it ended-up about the same. It was just an easier (seeming) path to follow. Ideally I would just ditch this and replace it with the packet blaster approach, but something tells me it isn't going to be a 2 days job. The nitty-gritty is rather messy. I really don't want to drag this PR on forever. The test itself may not be great but the pattern is established and produces result. So I'd like to submit it. However I have no problem moving the packetflood game into its own binary. So...stay tuned for that one.


topology/router_bm.topo line 1 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit. For other tests that expect specific topology, we put this .topo file into the testdata, i.e. acceptance/router_benchmark/testdata/. Probably a good way to keep things tidy.

router_benchmark is like end2end_integration. You can tun it by hand outside the test framework. That's why I put the topo file in the same location. But yeah, I can move the file. Done

Specifically:
* Use the GOMAXPROCS environment variable instead of a setting config field
  for the routers.
* Move the router_bm.topo file to router_benchmark/test_data/
* Full restore of some unchanged files.
* Add an explicit dependency of end2end_integration to the end2end binary.
Copy link
Contributor Author

@jiceatscion jiceatscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 17 files reviewed, 2 unresolved discussions (waiting on @matzf)


doc/manuals/router.rst line 182 at r5 (raw file):

Previously, jiceatscion wrote…

Well, yes it occured to me that we could use the env var directly. I just loathe env vars. They pop in and out of existence with no dicernable origin and no scope. I thought I was doing us a favor by adding an explicit config... but ok. I used the env var instead.

done

Moved it to a sibling binary named end2endblast.
As a result end2end* do not take a "game" argument any more.
Copy link
Contributor Author

@jiceatscion jiceatscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 21 files reviewed, 1 unresolved discussion (waiting on @matzf)


tools/end2end/main.go line 17 at r2 (raw file):

Previously, jiceatscion wrote…

Yeah, in terms of code quantity it ended-up about the same. It was just an easier (seeming) path to follow. Ideally I would just ditch this and replace it with the packet blaster approach, but something tells me it isn't going to be a 2 days job. The nitty-gritty is rather messy. I really don't want to drag this PR on forever. The test itself may not be great but the pattern is established and produces result. So I'd like to submit it. However I have no problem moving the packetflood game into its own binary. So...stay tuned for that one.

done

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 14 files at r1, 7 of 14 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: 13 of 21 files reviewed, 5 unresolved discussions (waiting on @jiceatscion)


acceptance/router_benchmark/BUILD.bazel line 8 at r2 (raw file):

Previously, jiceatscion wrote…

mmm, end2end_integration might need to have bin/end2end as a data dependency too. It seems to be enough. Although I don't know precisely how bazel manages it.

For the record; as you found out, the end2end is baked into the tester docker image, that's how it ends up in the right place.


doc/manuals/router.rst line 182 at r5 (raw file):

Previously, jiceatscion wrote…

done

Ah, I see. I appreciate that you want to improve the configuration, sorry for rudely shooting it down!
As we cannot get rid of the environment variable even if we wanted, also exposing this as a configuration option means that the settings can conflict and one takes precedence, which, IMO, is even nastier than the environment variable alone. Hope this makes some sense.


doc/manuals/router.rst line 214 at r7 (raw file):

      The number of kernel threads that go creates depends on the number of usable cores, which is
      controlled by the environment variable ``GOMAXPROCS``. See :ref:`GOMAXPROCS <gomaxprocs>`.

Nit: use the :env: role, this directly creates a reference and the right formatting. Then we don't need the separate anchor _gomaxprocs.

Suggestion:

controlled by the environment variable :env:`GOMAXPROCS`.

tools/end2end/main.go line 17 at r2 (raw file):

Previously, jiceatscion wrote…

done

Ok, got it. Seems good to me.


tools/end2end/main.go line 360 at r7 (raw file):

}

func (c *client) ping(ctx context.Context, n int, path snet.Path, logIfOk bool) error {

Nit: remove the logIfOk (true here in end2end, false in end2endblast).


tools/end2endblast/main.go line 117 at r7 (raw file):

	flag.Var(timeout, "timeout", "The timeout for each attempt")
	flag.BoolVar(&epic, "epic", false, "Enable EPIC")
	flag.BoolVar(&traces, "traces", true, "Enable Jaeger traces")

Nit: as far as I understand, the traces as they are implemented here never make much sense for the blaster. I think it would make sense to remove this flag here and drop all the related code. In the end2end, on the other hand, we could revert to always enable this.


tools/end2endblast/main.go line 306 at r7 (raw file):

	defer c.sdConn.Close()
	c.errorPaths = make(map[snet.PathFingerprint]struct{})
	pong_out := make(chan int)

Nit: pongOut (also pingResult, pongResult, ...)


tools/end2endblast/main.go line 359 at r7 (raw file):

			return true
		}
		c.path = p // struct fields cannot be assigned with :=

Nit: in contrast to end2end, this does not switch to alternative paths on errors. We could clean this up by moving this getRemote call out, before the first call of the blindPing.
We can also remove everything related to errorPaths

Namely:
* Doc formating
* Simplify end2endblast now that its function is separate from that of end2end.
* Restore endend almost completely to its original form for the same reason.
Copy link
Contributor Author

@jiceatscion jiceatscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 13 of 21 files reviewed, 1 unresolved discussion (waiting on @matzf)


doc/manuals/router.rst line 182 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Ah, I see. I appreciate that you want to improve the configuration, sorry for rudely shooting it down!
As we cannot get rid of the environment variable even if we wanted, also exposing this as a configuration option means that the settings can conflict and one takes precedence, which, IMO, is even nastier than the environment variable alone. Hope this makes some sense.

Well, yeah. For the config approach to be an actual improvement, we would need to also suppress the effect of the env var (which we can, btw: call GOMAXPROC(NumCPU()) by default). But that's added mess for too little benefit. I do find that there are too many sources of configuration in general; not just for this one thing, so if it gets addressed it better be as a separate more comprehensive effort.


tools/end2end/main.go line 360 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit: remove the logIfOk (true here in end2end, false in end2endblast).

done


tools/end2endblast/main.go line 117 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit: as far as I understand, the traces as they are implemented here never make much sense for the blaster. I think it would make sense to remove this flag here and drop all the related code. In the end2end, on the other hand, we could revert to always enable this.

Done. Might as well now.


tools/end2endblast/main.go line 306 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit: pongOut (also pingResult, pongResult, ...)

Done. Sorry for letting snakes in the camel pen.


tools/end2endblast/main.go line 359 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit: in contrast to end2end, this does not switch to alternative paths on errors. We could clean this up by moving this getRemote call out, before the first call of the blindPing.
We can also remove everything related to errorPaths

Done... and then some.

Copy link
Contributor Author

@jiceatscion jiceatscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 12 of 21 files reviewed, all discussions resolved (waiting on @matzf)


doc/manuals/router.rst line 214 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit: use the :env: role, this directly creates a reference and the right formatting. Then we don't need the separate anchor _gomaxprocs.

done

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 14 files at r1, 1 of 14 files at r6, 9 of 9 files at r8, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jiceatscion)

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jiceatscion)

@matzf matzf changed the title Add a router benchmark program and also use it as integration test testing: add router benchmark program and use it as integration test Nov 17, 2023
@matzf matzf merged commit 2136a63 into scionproto:master Nov 17, 2023
1 check passed
@jiceatscion jiceatscion deleted the router_benchmark branch December 7, 2023 10:33
juagargi pushed a commit to netsec-ethz/scion that referenced this pull request Mar 8, 2024
…cionproto#4437)

* Enhance the end2end_integration programs as follows:
  * end2end_integration has a slightly finer-grain role-based pair selection capability.
  * end2end_integration outputs an extra result line that provides precise start/stop timestamps for the whole test.
* process_metrics exposes the number of cores used by Go.
* Add  end2endblast. Similar to end2end but plays a game other than pingpong: packetblast. It keeps logging and tracing to a minimum. The server side responds to very few of the pings (1/256). Just enough to prove that the circuit works.
* Add an integration test called router_benchmark:
  * relies on an especially crafted topology: router_bm.topo
  * uses end2end_integration with command end2endblast over that topology as a load test for the router
  * extracts and logs relevant performance metrics after the test
  * compares performances with a set of expectations if executed by CI.

As a benchmark program, this is only a first approximation. Extending the end2end suite provided a quick solution but isn't ideal: it uses too many cores at once due to requiring 5 routers actively running for one of the tests. More work is in progress to replace this with some leaner setup where only one router is actually running.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants